Overview

Dataset statistics

Number of variables: 28
Number of observations: 29965
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.4 MiB
Average record size in memory: 224.0 B
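
The memory figures above can be cross-checked with simple arithmetic. The sketch below is only a plausibility check: the 8-byte value width and the 128-byte index overhead are assumptions (typical for 64-bit numeric columns and a default pandas index), not values stated in the report. Under those assumptions the result also agrees with the 234.2 KiB memory size reported for each variable.

```python
# Figures taken from the report.
N_ROWS = 29965
N_COLS = 28

# Assumptions (not stated in the report): every column holds 64-bit values,
# and the index adds a fixed 128-byte overhead.
BYTES_PER_VALUE = 8
INDEX_OVERHEAD = 128

# Memory of a single column, including the index overhead, in KiB.
per_column_kib = (N_ROWS * BYTES_PER_VALUE + INDEX_OVERHEAD) / 1024

# Memory of the whole table, counting the index overhead once.
total_bytes = N_ROWS * N_COLS * BYTES_PER_VALUE + INDEX_OVERHEAD
total_mib = total_bytes / 1024 ** 2

# Average number of bytes per observation (row).
avg_record_bytes = total_bytes / N_ROWS

print(f"per column: {per_column_kib:.1f} KiB")
print(f"total: {total_mib:.1f} MiB")
print(f"average record: {avg_record_bytes:.1f} B")
```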

Variable types

NUM: 21
BOOL: 6
CAT: 1

Reproduction

Analysis started: 2020-05-10 18:45:08.164282
Analysis finished: 2020-05-10 18:47:30.476996
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

High Correlation: BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field
High Correlation: BILL_AMT1 is highly correlated with BILL_AMT2
High Correlation: BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field
High Correlation: BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields
High Correlation: BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field
High Correlation: BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field
Skewed: PAY_AMT2 is highly skewed (γ1 = 30.43861292)
Zeros: PAY_0 has 23169 (77.3%) zeros
Zeros: PAY_2 has 25528 (85.2%) zeros
Zeros: PAY_3 has 25753 (85.9%) zeros
Zeros: PAY_4 has 26456 (88.3%) zeros
Zeros: PAY_5 has 26998 (90.1%) zeros
Zeros: PAY_6 has 26887 (89.7%) zeros
Zeros: BILL_AMT1 has 1978 (6.6%) zeros
Zeros: BILL_AMT2 has 2476 (8.3%) zeros
Zeros: BILL_AMT3 has 2840 (9.5%) zeros
Zeros: BILL_AMT4 has 3165 (10.6%) zeros
Zeros: BILL_AMT5 has 3476 (11.6%) zeros
Zeros: BILL_AMT6 has 3990 (13.3%) zeros
Zeros: PAY_AMT1 has 5218 (17.4%) zeros
Zeros: PAY_AMT2 has 5365 (17.9%) zeros
Zeros: PAY_AMT3 has 5937 (19.8%) zeros
Zeros: PAY_AMT4 has 6377 (21.3%) zeros
Zeros: PAY_AMT5 has 6672 (22.3%) zeros
Zeros: PAY_AMT6 has 7142 (23.8%) zeros

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 29965
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14993.93252127482
Minimum: 0
Maximum: 29999
Zeros: 1
Zeros (%): < 0.1%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1498.2
Q1: 7496
Median: 14991
Q3: 22493
95-th percentile: 28495.8
Maximum: 29999
Range: 29999
Interquartile range (IQR): 14997

Descriptive statistics

Standard deviation: 8659.328323
Coefficient of variation (CV): 0.5775221618
Kurtosis: -1.199841197
Mean: 14993.93252
Median Absolute Deviation (MAD): 7499.008832
Skewness: 0.0005462130641
Sum: 449293188
Variance: 74983967.01
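
Several of the statistics listed for each numeric variable are derived from the others. As a rough illustration (a sketch that uses the df_index figures from this report, with standard textbook definitions assumed rather than taken from the pandas-profiling source), the range, IQR, coefficient of variation, variance and sum can be recomputed like this:

```python
# Reported values for df_index.
n = 29965
mean = 14993.93252127482
std = 8659.328323
minimum, maximum = 0, 29999
q1, q3 = 7496, 22493

# Derived statistics, using the usual definitions (assumed here).
value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation
total = mean * n                  # Sum is the mean times the number of rows

print(value_range, iqr)
print(round(cv, 6), round(variance, 2), round(total))
```

The recomputed values agree with the reported Range, IQR, CV, Variance and Sum up to the rounding of the displayed standard deviation.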
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 29999.], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
2047 1 < 0.1%
 
1322 1 < 0.1%
 
15629 1 < 0.1%
 
9486 1 < 0.1%
 
11535 1 < 0.1%
 
21792 1 < 0.1%
 
23841 1 < 0.1%
 
17698 1 < 0.1%
 
19747 1 < 0.1%
 
29988 1 < 0.1%
 
Other values (29955) 29955 > 99.9%
 
Value Count Frequency (%)
0 1 < 0.1%
 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
Value Count Frequency (%)
29999 1 < 0.1%
 
29998 1 < 0.1%
 
29997 1 < 0.1%
 
29996 1 < 0.1%
 
29995 1 < 0.1%
 

LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167442.00500584015
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129760.1352
Coefficient of variation (CV): 0.7749556942
Kurtosis: 0.5375871217
Mean: 167442.005
Median Absolute Deviation (MAD): 104968.4634
Skewness: 0.9934913272
Sum: 5017399680
Variance: 1.683769269e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10000. 13000. 18000. 25000. 35000. ... 505000. 525000. 645000. 755000. 1000000.], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
50000 3363 11.2%
 
20000 1975 6.6%
 
30000 1610 5.4%
 
80000 1564 5.2%
 
200000 1524 5.1%
 
150000 1107 3.7%
 
100000 1047 3.5%
 
180000 993 3.3%
 
360000 874 2.9%
 
60000 825 2.8%
 
Other values (71) 15083 50.3%
 
Value Count Frequency (%)
10000 493 1.6%
 
16000 2 < 0.1%
 
20000 1975 6.6%
 
30000 1610 5.4%
 
40000 230 0.8%
 
Value Count Frequency (%)
1000000 1 < 0.1%
 
800000 2 < 0.1%
 
780000 2 < 0.1%
 
760000 1 < 0.1%
 
750000 4 < 0.1%
 

SEX
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
0 18091 60.4%
 
1 11874 39.6%
 

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
2 15945 53.2%
 
1 13643 45.5%
 
3 323 1.1%
 
0 54 0.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
Value Count Frequency (%)
Decimal_Number 4 100.0%
 
Value Count Frequency (%)
Common 4 100.0%
 
Value Count Frequency (%)
ASCII 4 100.0%
 

AGE
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.487969297513764
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
Median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.219459233
Coefficient of variation (CV): 0.2597911184
Kurtosis: 0.04398801494
Mean: 35.4879693
Median Absolute Deviation (MAD): 7.547040921
Skewness: 0.7320560019
Sum: 1063397
Variance: 84.99842855
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 23.5 26.5 ... 58.5 61.5 66.5 70.5 79. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
29 1602 5.3%
 
27 1475 4.9%
 
28 1406 4.7%
 
30 1394 4.7%
 
26 1252 4.2%
 
31 1213 4.0%
 
25 1185 4.0%
 
34 1161 3.9%
 
32 1157 3.9%
 
33 1146 3.8%
 
Other values (46) 16974 56.6%
 
Value Count Frequency (%)
21 67 0.2%
 
22 560 1.9%
 
23 930 3.1%
 
24 1126 3.8%
 
25 1185 4.0%
 
Value Count Frequency (%)
79 1 < 0.1%
 
75 3 < 0.1%
 
74 1 < 0.1%
 
73 4 < 0.1%
 
72 3 < 0.1%
 

PAY_0
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.35641581845486403
Minimum: 0
Maximum: 8
Zeros: 23169
Zeros (%): 77.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7607521242
Coefficient of variation (CV): 2.134451068
Kurtosis: 12.46809493
Mean: 0.3564158185
Median Absolute Deviation (MAD): 0.5511628966
Skewness: 2.811995522
Sum: 10680
Variance: 0.5787437945
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 4.5 5.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 23169 77.3%
 
1 3667 12.2%
 
2 2666 8.9%
 
3 322 1.1%
 
4 76 0.3%
 
5 26 0.1%
 
8 19 0.1%
 
6 11 < 0.1%
 
7 9 < 0.1%
 
Value Count Frequency (%)
0 23169 77.3%
 
1 3667 12.2%
 
2 2666 8.9%
 
3 322 1.1%
 
4 76 0.3%
 
Value Count Frequency (%)
8 19 0.1%
 
7 9 < 0.1%
 
6 11 < 0.1%
 
5 26 0.1%
 
4 76 0.3%
 

PAY_2
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.32034039712998497
Minimum: 0
Maximum: 8
Zeros: 25528
Zeros (%): 85.2%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8020642448
Coefficient of variation (CV): 2.503787384
Kurtosis: 7.828288807
Mean: 0.3203403971
Median Absolute Deviation (MAD): 0.5458134262
Skewness: 2.597303525
Sum: 9599
Variance: 0.6433070527
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 4.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 25528 85.2%
 
2 3926 13.1%
 
3 326 1.1%
 
4 99 0.3%
 
1 28 0.1%
 
5 25 0.1%
 
7 20 0.1%
 
6 12 < 0.1%
 
8 1 < 0.1%
 
Value Count Frequency (%)
0 25528 85.2%
 
1 28 0.1%
 
2 3926 13.1%
 
3 326 1.1%
 
4 99 0.3%
 
Value Count Frequency (%)
8 1 < 0.1%
 
7 20 0.1%
 
6 12 < 0.1%
 
5 25 0.1%
 
4 99 0.3%
 

PAY_3
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3042883363924579
Minimum: 0
Maximum: 8
Zeros: 25753
Zeros (%): 85.9%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7906960258
Coefficient of variation (CV): 2.598509148
Kurtosis: 10.46442158
Mean: 0.3042883364
Median Absolute Deviation (MAD): 0.5230327066
Skewness: 2.854576145
Sum: 9118
Variance: 0.6252002052
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 4.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 25753 85.9%
 
2 3819 12.7%
 
3 240 0.8%
 
4 75 0.3%
 
7 27 0.1%
 
6 23 0.1%
 
5 21 0.1%
 
1 4 < 0.1%
 
8 3 < 0.1%
 
Value Count Frequency (%)
0 25753 85.9%
 
1 4 < 0.1%
 
2 3819 12.7%
 
3 240 0.8%
 
4 75 0.3%
 
Value Count Frequency (%)
8 3 < 0.1%
 
7 27 0.1%
 
6 23 0.1%
 
5 21 0.1%
 
4 75 0.3%
 

PAY_4
Real number (ℝ≥0)

ZEROS
Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2589354246621058
Minimum: 0
Maximum: 8
Zeros: 26456
Zeros (%): 88.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7612003542
Coefficient of variation (CV): 2.939730457
Kurtosis: 17.15228434
Mean: 0.2589354247
Median Absolute Deviation (MAD): 0.4572264705
Skewness: 3.545355069
Sum: 7759
Variance: 0.5794259792
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 26456 88.3%
 
2 3159 10.5%
 
3 180 0.6%
 
4 68 0.2%
 
7 58 0.2%
 
5 35 0.1%
 
6 5 < 0.1%
 
8 2 < 0.1%
 
1 2 < 0.1%
 
Value Count Frequency (%)
0 26456 88.3%
 
1 2 < 0.1%
 
2 3159 10.5%
 
3 180 0.6%
 
4 68 0.2%
 
Value Count Frequency (%)
8 2 < 0.1%
 
7 58 0.2%
 
6 5 < 0.1%
 
5 35 0.1%
 
4 68 0.2%
 

PAY_5
Real number (ℝ≥0)

ZEROS
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22162522943434007
Minimum: 0
Maximum: 8
Zeros: 26998
Zeros (%): 90.1%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.717768155
Coefficient of variation (CV): 3.238657245
Kurtosis: 21.30683323
Mean: 0.2216252294
Median Absolute Deviation (MAD): 0.399361785
Skewness: 3.965041328
Sum: 6641
Variance: 0.5151911244
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 2.5 3.5 4.5 5.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 26998 90.1%
 
2 2626 8.8%
 
3 178 0.6%
 
4 83 0.3%
 
7 58 0.2%
 
5 17 0.1%
 
6 4 < 0.1%
 
8 1 < 0.1%
 
Value Count Frequency (%)
0 26998 90.1%
 
2 2626 8.8%
 
3 178 0.6%
 
4 83 0.3%
 
5 17 0.1%
 
Value Count Frequency (%)
8 1 < 0.1%
 
7 58 0.2%
 
6 4 < 0.1%
 
5 17 0.1%
 
4 83 0.3%
 

PAY_6
Real number (ℝ≥0)

ZEROS
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22669781411646922
Minimum: 0
Maximum: 8
Zeros: 26887
Zeros (%): 89.7%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7154828373
Coefficient of variation (CV): 3.156108232
Kurtosis: 19.92946349
Mean: 0.2266978141
Median Absolute Deviation (MAD): 0.4068229019
Skewness: 3.81971703
Sum: 6793
Variance: 0.5119156905
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1. 2.5 3.5 4.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 26887 89.7%
 
2 2766 9.2%
 
3 184 0.6%
 
4 48 0.2%
 
7 46 0.2%
 
6 19 0.1%
 
5 13 < 0.1%
 
8 2 < 0.1%
 
Value Count Frequency (%)
0 26887 89.7%
 
2 2766 9.2%
 
3 184 0.6%
 
4 48 0.2%
 
5 13 < 0.1%
 
Value Count Frequency (%)
8 2 < 0.1%
 
7 46 0.2%
 
6 19 0.1%
 
5 13 < 0.1%
 
4 48 0.2%
 

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22723
Unique (%): 75.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51283.00977807442
Minimum: -165580
Maximum: 964511
Zeros: 1978
Zeros (%): 6.6%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3595
Median: 22438
Q3: 67260
95-th percentile: 201303.8
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63665

Descriptive statistics

Standard deviation: 73658.1324
Coefficient of variation (CV): 1.436306736
Kurtosis: 9.796846218
Mean: 51283.00978
Median Absolute Deviation (MAD): 50524.59367
Skewness: 2.662513456
Sum: 1536695388
Variance: 5425520469
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-165580. -14847. -6352.5 -2223. -1032.5 ... 311489. 390634.5 509866. 641760. 964511. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 1978 6.6%
 
390 243 0.8%
 
780 76 0.3%
 
326 72 0.2%
 
316 63 0.2%
 
2500 59 0.2%
 
396 48 0.2%
 
2400 39 0.1%
 
416 29 0.1%
 
1050 25 0.1%
 
Other values (22713) 27333 91.2%
 
Value Count Frequency (%)
-165580 1 < 0.1%
 
-154973 1 < 0.1%
 
-15308 1 < 0.1%
 
-14386 1 < 0.1%
 
-11545 1 < 0.1%
 
Value Count Frequency (%)
964511 1 < 0.1%
 
746814 1 < 0.1%
 
653062 1 < 0.1%
 
630458 1 < 0.1%
 
626648 1 < 0.1%
 

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22346
Unique (%): 74.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49236.36629400968
Minimum: -69777
Maximum: 983931
Zeros: 2476
Zeros (%): 8.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -69777
5-th percentile: 0
Q1: 3010
Median: 21295
Q3: 64109
95-th percentile: 194889.6
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61099

Descriptive statistics

Standard deviation: 71195.56739
Coefficient of variation (CV): 1.445995567
Kurtosis: 10.29321199
Mean: 49236.36629
Median Absolute Deviation (MAD): 48694.20765
Skewness: 2.70386174
Sum: 1475367716
Variance: 5068808816
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-6.97770e+04 -9.48450e+03 -2.97650e+03 -1.04700e+03 -4.22500e+02 ... 3.24529e+05 4.00555e+05 5.12588e+05 6.01868e+05 9.83931e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 2476 8.3%
 
390 230 0.8%
 
326 75 0.3%
 
780 75 0.3%
 
316 72 0.2%
 
2500 51 0.2%
 
396 50 0.2%
 
2400 42 0.1%
 
-200 29 0.1%
 
416 28 0.1%
 
Other values (22336) 26837 89.6%
 
Value Count Frequency (%)
-69777 1 < 0.1%
 
-67526 1 < 0.1%
 
-33350 1 < 0.1%
 
-30000 1 < 0.1%
 
-26214 1 < 0.1%
 
Value Count Frequency (%)
983931 1 < 0.1%
 
743970 1 < 0.1%
 
671563 1 < 0.1%
 
646770 1 < 0.1%
 
624475 1 < 0.1%
 

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22026
Unique (%): 73.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47067.91606874687
Minimum: -157264
Maximum: 1664089
Zeros: 2840
Zeros (%): 9.5%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2711
Median: 20135
Q3: 60201
95-th percentile: 187901
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57490

Descriptive statistics

Standard deviation: 69371.35232
Coefficient of variation (CV): 1.473856464
Kurtosis: 19.77100256
Mean: 47067.91607
Median Absolute Deviation (MAD): 46893.72568
Skewness: 3.086493832
Sum: 1410390105
Variance: 4812384523
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.572640e+05 -1.680800e+04 -5.286500e+03 -2.802000e+03 -1.065000e+03 ... 3.080855e+05 3.956640e+05 4.993875e+05 5.881930e+05 1.664089e+06], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 2840 9.5%
 
390 274 0.9%
 
780 74 0.2%
 
326 63 0.2%
 
316 62 0.2%
 
396 47 0.2%
 
2500 40 0.1%
 
2400 39 0.1%
 
416 29 0.1%
 
200 27 0.1%
 
Other values (22016) 26470 88.3%
 
Value Count Frequency (%)
-157264 1 < 0.1%
 
-61506 1 < 0.1%
 
-46127 1 < 0.1%
 
-34041 1 < 0.1%
 
-25443 1 < 0.1%
 
Value Count Frequency (%)
1664089 1 < 0.1%
 
855086 1 < 0.1%
 
693131 1 < 0.1%
 
689643 1 < 0.1%
 
689627 1 < 0.1%
 

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21548
Unique (%): 71.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43313.32988486568
Minimum: -170000
Maximum: 891586
Zeros: 3165
Zeros (%): 10.6%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2360
Median: 19081
Q3: 54601
95-th percentile: 174469.8
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52241

Descriptive statistics

Standard deviation: 64353.51437
Coefficient of variation (CV): 1.485766958
Kurtosis: 11.29858229
Mean: 43313.32988
Median Absolute Deviation (MAD): 43658.56894
Skewness: 2.820544832
Sum: 1297883930
Variance: 4141374812
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-170000. -25896.5 -6121.5 -2976.5 -1570. ... 320757.5 390539. 489393. 570919.5 891586. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 3165 10.6%
 
390 245 0.8%
 
780 101 0.3%
 
316 68 0.2%
 
326 62 0.2%
 
396 43 0.1%
 
150 39 0.1%
 
2400 39 0.1%
 
2500 34 0.1%
 
1000 33 0.1%
 
Other values (21538) 26136 87.2%
 
Value Count Frequency (%)
-170000 1 < 0.1%
 
-81334 1 < 0.1%
 
-65167 1 < 0.1%
 
-50616 1 < 0.1%
 
-46627 1 < 0.1%
 
Value Count Frequency (%)
891586 1 < 0.1%
 
706864 1 < 0.1%
 
628699 1 < 0.1%
 
616836 1 < 0.1%
 
572805 1 < 0.1%
 

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21010
Unique (%): 70.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40358.33439012181
Minimum: -81334
Maximum: 927171
Zeros: 3476
Zeros (%): 11.6%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -81334
5-th percentile: 0
Q1: 1787
Median: 18130
Q3: 50247
95-th percentile: 165805.6
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48460

Descriptive statistics

Standard deviation: 60817.13062
Coefficient of variation (CV): 1.506928657
Kurtosis: 12.29453891
Mean: 40358.33439
Median Absolute Deviation (MAD): 41230.73164
Skewness: 2.874925049
Sum: 1209337490
Variance: 3698723377
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-81334. -10657.5 -5042. -1981.5 -1003. ... 265940. 311764. 370768. 520227. 927171. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 3476 11.6%
 
390 234 0.8%
 
780 94 0.3%
 
316 79 0.3%
 
326 62 0.2%
 
150 58 0.2%
 
396 46 0.2%
 
2400 39 0.1%
 
2500 37 0.1%
 
416 36 0.1%
 
Other values (21000) 25804 86.1%
 
Value Count Frequency (%)
-81334 1 < 0.1%
 
-61372 1 < 0.1%
 
-53007 1 < 0.1%
 
-46627 1 < 0.1%
 
-37594 1 < 0.1%
 
Value Count Frequency (%)
927171 1 < 0.1%
 
823540 1 < 0.1%
 
587067 1 < 0.1%
 
551702 1 < 0.1%
 
547880 1 < 0.1%
 

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 20604
Unique (%): 68.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38917.012280994495
Minimum: -339603
Maximum: 961664
Zeros: 3990
Zeros (%): 13.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1262
Median: 17124
Q3: 49252
95-th percentile: 161932
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47990

Descriptive statistics

Standard deviation: 59574.14774
Coefficient of variation (CV): 1.530799623
Kurtosis: 12.25912611
Mean: 38917.01228
Median Absolute Deviation (MAD): 40401.32034
Skewness: 2.845137169
Sum: 1166148273
Variance: 3549079079
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-339603. -54251.5 -24295. -6106. -3020. ... 311963.5 365432.5 439967.5 527638.5 961664. ], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 3990 13.3%
 
390 206 0.7%
 
780 86 0.3%
 
150 78 0.3%
 
316 77 0.3%
 
326 56 0.2%
 
396 44 0.1%
 
416 36 0.1%
 
-18 33 0.1%
 
2400 32 0.1%
 
Other values (20594) 25327 84.5%
 
Value Count Frequency (%)
-339603 1 < 0.1%
 
-209051 1 < 0.1%
 
-150953 1 < 0.1%
 
-94625 1 < 0.1%
 
-73895 1 < 0.1%
 
Value Count Frequency (%)
961664 1 < 0.1%
 
699944 1 < 0.1%
 
568638 1 < 0.1%
 
527711 1 < 0.1%
 
527566 1 < 0.1%
 

PAY_AMT1
Real number (ℝ≥0)

ZEROS
Distinct count: 7943
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5670.099315868513
Minimum: 0
Maximum: 873552
Zeros: 5218
Zeros (%): 17.4%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1000
Median: 2102
Q3: 5008
95-th percentile: 18447.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4008

Descriptive statistics

Standard deviation: 16571.84947
Coefficient of variation (CV): 2.92267358
Kurtosis: 414.8548633
Mean: 5670.099316
Median Absolute Deviation (MAD): 5926.422933
Skewness: 14.66159454
Sum: 169904526
Variance: 274626194.7
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 6.500000e+00 1.750000e+01 1.635000e+02 ... 1.000740e+05 1.017880e+05 1.647010e+05 3.034075e+05 8.735520e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 5218 17.4%
 
2000 1363 4.5%
 
3000 891 3.0%
 
5000 698 2.3%
 
1500 507 1.7%
 
4000 426 1.4%
 
10000 401 1.3%
 
1000 365 1.2%
 
2500 298 1.0%
 
6000 294 1.0%
 
Other values (7933) 19504 65.1%
 
Value Count Frequency (%)
0 5218 17.4%
 
1 9 < 0.1%
 
2 14 < 0.1%
 
3 15 0.1%
 
4 18 0.1%
 
Value Count Frequency (%)
873552 1 < 0.1%
 
505000 1 < 0.1%
 
493358 1 < 0.1%
 
423903 1 < 0.1%
 
405016 1 < 0.1%
 

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 7899
Unique (%): 26.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5927.983180377107
Minimum: 0
Maximum: 1684259
Zeros: 5365
Zeros (%): 17.9%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 850
Median: 2010
Q3: 5000
95-th percentile: 19030.8
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4150

Descriptive statistics

Standard deviation: 23053.45664
Coefficient of variation (CV): 3.888920724
Kurtosis: 1639.924451
Mean: 5927.98318
Median Absolute Deviation (MAD): 6483.551092
Skewness: 30.43861292
Sum: 177632016
Variance: 531461863.3
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 5.500000e+00 1.650000e+01 3.050000e+01 ... 1.000805e+05 1.500855e+05 2.066760e+05 4.082775e+05 1.684259e+06], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 5365 17.9%
 
2000 1290 4.3%
 
3000 857 2.9%
 
5000 717 2.4%
 
1000 594 2.0%
 
1500 521 1.7%
 
4000 410 1.4%
 
10000 318 1.1%
 
6000 283 0.9%
 
2500 251 0.8%
 
Other values (7889) 19359 64.6%
 
Value Count Frequency (%)
0 5365 17.9%
 
1 15 0.1%
 
2 20 0.1%
 
3 18 0.1%
 
4 11 < 0.1%
 
Value Count Frequency (%)
1684259 1 < 0.1%
 
1227082 1 < 0.1%
 
1215471 1 < 0.1%
 
1024516 1 < 0.1%
 
580464 1 < 0.1%
 

PAY_AMT3
Real number (ℝ≥0)

ZEROS
Distinct count: 7518
Unique (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5231.688836976473
Minimum: 0
Maximum: 896040
Zeros: 5937
Zeros (%): 19.8%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 390
Median: 1804
Q3: 4512
95-th percentile: 17602.6
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4122

Descriptive statistics

Standard deviation: 17616.36112
Coefficient of variation (CV): 3.367241759
Kurtosis: 563.7392771
Mean: 5231.688837
Median Absolute Deviation (MAD): 5870.503512
Skewness: 17.2081766
Sum: 156767556
Variance: 310336179.3
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 1.250000e+01 3.450000e+01 1.495000e+02 ... 1.000895e+05 1.642195e+05 2.376245e+05 4.092800e+05 8.960400e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 5937 19.8%
 
2000 1285 4.3%
 
1000 1103 3.7%
 
3000 870 2.9%
 
5000 721 2.4%
 
1500 490 1.6%
 
4000 381 1.3%
 
10000 312 1.0%
 
1200 243 0.8%
 
6000 241 0.8%
 
Other values (7508) 18382 61.3%
 
Value Count Frequency (%)
0 5937 19.8%
 
1 13 < 0.1%
 
2 19 0.1%
 
3 14 < 0.1%
 
4 15 0.1%
 
Value Count Frequency (%)
896040 1 < 0.1%
 
889043 1 < 0.1%
 
508229 1 < 0.1%
 
417588 1 < 0.1%
 
400972 1 < 0.1%
 

PAY_AMT4
Real number (ℝ≥0)

ZEROS
Distinct count: 6937
Unique (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4831.617453695979
Minimum: 0
Maximum: 621000
Zeros: 6377
Zeros (%): 21.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 300
Median: 1500
Q3: 4016
95-th percentile: 16037
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3716

Descriptive statistics

Standard deviation: 15674.46454
Coefficient of variation (CV): 3.244144365
Kurtosis: 277.0486932
Mean: 4831.617454
Median Absolute Deviation (MAD): 5536.713417
Skewness: 12.89850649
Sum: 144779417
Variance: 245688838.5
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 6.50000e+00 1.85000e+01 9.95000e+01 ... 1.00052e+05 1.24744e+05 2.03538e+05 3.31385e+05 6.21000e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 6377 21.3%
 
1000 1394 4.7%
 
2000 1214 4.1%
 
3000 887 3.0%
 
5000 810 2.7%
 
1500 441 1.5%
 
4000 402 1.3%
 
10000 341 1.1%
 
2500 259 0.9%
 
500 258 0.9%
 
Other values (6927) 17582 58.7%
 
Value Count Frequency (%)
0 6377 21.3%
 
1 22 0.1%
 
2 22 0.1%
 
3 13 < 0.1%
 
4 20 0.1%
 
Value Count Frequency (%)
621000 1 < 0.1%
 
528897 1 < 0.1%
 
497000 1 < 0.1%
 
432130 1 < 0.1%
 
400046 1 < 0.1%
 

PAY_AMT5
Real number (ℝ≥0)

ZEROS
Distinct count: 6897
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4804.897046554313
Minimum: 0
Maximum: 426529
Zeros: 6672
Zeros (%): 22.3%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 261
Median: 1500
Q3: 4042
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3781

Descriptive statistics

Standard deviation: 15286.3723
Coefficient of variation (CV): 3.181415158
Kurtosis: 179.8752095
Mean: 4804.897047
Median Absolute Deviation (MAD): 5486.074532
Skewness: 11.12174174
Sum: 143978740
Variance: 233673178
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 2.350000e+01 9.950000e+01 ... 9.991100e+04 1.000720e+05 1.100710e+05 2.153995e+05 4.265290e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 6672 22.3%
 
1000 1340 4.5%
 
2000 1323 4.4%
 
3000 947 3.2%
 
5000 814 2.7%
 
1500 426 1.4%
 
4000 401 1.3%
 
10000 343 1.1%
 
500 250 0.8%
 
6000 247 0.8%
 
Other values (6887) 17202 57.4%
 
Value Count Frequency (%)
0 6672 22.3%
 
1 21 0.1%
 
2 13 < 0.1%
 
3 13 < 0.1%
 
4 12 < 0.1%
 
Value Count Frequency (%)
426529 1 < 0.1%
 
417990 1 < 0.1%
 
388071 1 < 0.1%
 
379267 1 < 0.1%
 
332000 1 < 0.1%
 

PAY_AMT6
Real number (ℝ≥0)

ZEROS
Distinct count: 6939
Unique (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5221.498014350075
Minimum: 0
Maximum: 528666
Zeros: 7142
Zeros (%): 23.8%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 131
Median: 1500
Q3: 4000
95-th percentile: 17384.4
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3869

Descriptive statistics

Standard deviation: 17786.97686
Coefficient of variation (CV): 3.406489252
Kurtosis: 166.9817897
Mean: 5221.498014
Median Absolute Deviation (MAD): 6204.397746
Skewness: 10.63509397
Sum: 156462188
Variance: 316376546
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 1.850000e+01 9.950000e+01 ... 1.000195e+05 1.223750e+05 2.013000e+05 2.889910e+05 5.286660e+05], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
0 7142 23.8%
 
1000 1299 4.3%
 
2000 1295 4.3%
 
3000 914 3.1%
 
5000 808 2.7%
 
1500 439 1.5%
 
4000 411 1.4%
 
10000 356 1.2%
 
500 247 0.8%
 
6000 220 0.7%
 
Other values (6929) 16834 56.2%
 
Value Count Frequency (%)
0 7142 23.8%
 
1 20 0.1%
 
2 9 < 0.1%
 
3 14 < 0.1%
 
4 12 < 0.1%
 
Value Count Frequency (%)
528666 1 < 0.1%
 
527143 1 < 0.1%
 
443001 1 < 0.1%
 
422000 1 < 0.1%
 
403500 1 < 0.1%
 

default
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
1 23335 77.9%
 
0 6630 22.1%
 

EDUCATION_graduate school
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
0 19402 64.7%
 
1 10563 35.3%
 

EDUCATION_high school
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
0 25050 83.6%
 
1 4915 16.4%
 

EDUCATION_other
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
0 29497 98.4%
 
1 468 1.6%
 

EDUCATION_university
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB
Value Count Frequency (%)
0 15946 53.2%
 
1 14019 46.8%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
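
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation only, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel, so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)

# Hypothetical data: y_line is an exact linear function of x, y_noisy is not.
x = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]
y_noisy = [2, 1, 4, 3, 6]

r_line = pearson_r(x, y_line)
r_noisy = pearson_r(x, y_noisy)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_noisy_rescaled = pearson_r([10 * a + 7 for a in x], y_noisy)
```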

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
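
As a sketch of this procedure in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption made here (a common convention); the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (dev_x * dev_y)

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y_cubed = [v ** 3 for v in x]
rho = spearman_rho(x, y_cubed)
```

Because the ranks of `x` and `y_cubed` are identical, ρ comes out at 1 (up to floating-point rounding) even though the relationship is not linear.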

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
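
A literal reading of this definition can be sketched as follows. Tied pairs count as neither concordant nor discordant here, which corresponds to the variant usually called tau-a; statistical libraries often apply a tie correction instead, so their results may differ on data with many ties. The function name and sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)   # 5 concordant, 1 discordant, 6 pairs
```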

Missing values

Sample

First rows

df_index, LIMIT_BAL, SEX, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6, default, EDUCATION_graduate school, EDUCATION_high school, EDUCATION_other, EDUCATION_university
00200000124220000391331026890000689000000001
11120000022602000226821725268232723455326101000100010000200000001
2290000023400000029239140271355914331149481554915181500100010001000500010001
3350000013700000046990482334929128314289592954720002019120011001069100010001
44500001157000000861756703583520940191461913120003668110000900068967910001
55500001237000000644005706957608193941961920024250018156571000100080011000
66500000122900000036796541202344500754265348300347394455000400003800020239137501377011000
77100000022300000011876380601221-15956738060105811687154210001
881400000128002000112851409612108122111179337193329043210001000100010100
9920000123500000000001300713912000130071122010100

Last rows

df_index, LIMIT_BAL, SEX, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6, default, EDUCATION_graduate school, EDUCATION_high school, EDUCATION_other, EDUCATION_university
29955299901400001141000000138325137142139110138262496754612160007000422815052000200010001
2995629991210000113432222225002500250025002500250000000000001
2995729992100001143000000880210400000020000000010100
29958299931000001238000000304214271029967062669473550042000111784400030002000200011000
299592999480000123422222272557777087938477519826078115870003500070000400000001
29960299952200001139000000188948192815208365880043123715980850020000500330475000100010100
299612999615000012430000001683182835028979519001837352689981290010100
2996229997300001237432000356533562758208782058219357002200042002000310000001
2996329998800001141100000-164578379763045277411855489448590034091178192652964180400100
299642999950000114600000047929489054976436535324281531320781800143010001000100000001